With the output shown in Figure 16-4, where the intercept (a) is 76.9 and the slope (b) is 0.487, you
can write the equation of the fitted straight line like this: SBP = 76.9 + 0.487 Weight.
Then you can use this equation to predict someone’s SBP if you know their weight. So, if a person
weighs 100 kilograms, you can estimate that their SBP will be around 76.9 + 0.487 × 100,
which is 76.9 + 48.7, or about 125.6 mmHg. Your prediction probably won’t be exactly on the nose,
but it should be better than not using a predictive model and just guessing.
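To see that arithmetic in action, here is a minimal Python sketch of the prediction. The intercept and slope are the values from the fitted equation; the function name predict_sbp is only an illustrative choice and doesn't come from the regression output.

```python
# Coefficients of the fitted line: SBP = 76.9 + 0.487 * Weight
INTERCEPT = 76.9   # a, in mmHg
SLOPE = 0.487      # b, in mmHg per kilogram of body weight

def predict_sbp(weight_kg):
    """Return the predicted SBP (mmHg) for a person of the given weight (kg)."""
    return INTERCEPT + SLOPE * weight_kg

predicted = predict_sbp(100)
print(f"Predicted SBP at 100 kg: {predicted:.1f} mmHg")
```

You can call predict_sbp with any weight, but predictions are most trustworthy for weights inside the range of the data used to fit the line.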
How far off will your prediction be? The residual SE provides a unit of measurement to answer this
question. As we explain in the earlier section “Summary statistics for the residuals,” the residual SE
indicates how much the individual points tend to scatter above and below the fitted line. For the SBP
example, this number is
, so you can expect your prediction to be within about
mmHg most
of the time.
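If you're curious where a residual SE comes from, the sketch below computes one as the square root of the sum of squared residuals divided by n - 2 (the intercept and slope each use up one degree of freedom). The four data points are made up purely for illustration and are not the SBP data, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residual_se(xs, ys):
    """Residual standard error: sqrt(sum of squared residuals / (n - 2))."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ss_resid = sum(r ** 2 for r in residuals)
    return math.sqrt(ss_resid / (len(xs) - 2))

# Made-up illustrative data (not the SBP example)
example_x = [1, 2, 3, 4]
example_y = [1, 3, 2, 4]
example_rse = residual_se(example_x, example_y)
print(f"Residual SE: {example_rse:.3f}")
```

The result is in the same units as the outcome variable, which is why you can read it as a rough "typical miss" for a prediction.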
Recognizing What Can Go Wrong with Straight-Line Regression
Fitting a straight line to a set of data is a relatively simple task, but you still have to be careful. A
computer program does whatever you tell it to, even if it’s something you shouldn’t do.
Those new to straight-line regression may slip up in the following ways:
Fitting a straight line to curved data: Examining the pattern of residuals in the residuals versus
fitted chart in Figure 16-5 can let you know if you have this problem.
Ignoring outliers in the data: Outliers — especially those in the corners of a scatterplot like the
one in Figure 16-3 — can mess up all the classical statistical analyses, and regression is no
exception. One or two data points that are way off the main trend of the points will drag the fitted
line away from the other points. That’s because the strength with which each point tugs at the fitted
line is proportionate to the square of its distance from the line, and outliers have a lot of distance,
so they have a strong influence.
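The pull of a single outlier is easy to demonstrate. In this sketch, five made-up points lie exactly on a line with a slope of 2, and then one far-off point is added in the lower-right corner. The data and the helper name fit_slope are assumptions for illustration only.

```python
def fit_slope(xs, ys):
    """Return the least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Five made-up points that follow y = 2x exactly
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 6, 8, 10]

# The same points plus one outlier in the lower-right corner
outlier_x = clean_x + [10]
outlier_y = clean_y + [0]

clean_slope = fit_slope(clean_x, clean_y)
slope_with_outlier = fit_slope(outlier_x, outlier_y)

print(f"Slope without the outlier: {clean_slope:.3f}")
print(f"Slope with the outlier:    {slope_with_outlier:.3f}")
```

With these particular numbers, one stray point is enough to flip the slope from positive to negative, even though the other five points agree perfectly with each other.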
Always look at a scatterplot of your data to make sure outliers aren’t present. Examine the
residuals to ensure they are distributed normally above and below the fitted line.
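To get a feel for what curved data does to the residuals, here is a small sketch that fits a straight line to made-up data following y = x squared and then lists the residuals in order of x. The data and function name are hypothetical; with real data you would look at a residuals versus fitted chart rather than a printed list.

```python
def line_residuals(xs, ys):
    """Fit y = a + b*x by least squares and return the residuals (observed - fitted)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up curved data: y = x squared for x = 0 through 6
curved_x = [0, 1, 2, 3, 4, 5, 6]
curved_y = [x ** 2 for x in curved_x]

curved_residuals = line_residuals(curved_x, curved_y)
for x, r in zip(curved_x, curved_residuals):
    print(f"x = {x}: residual = {r:+.1f}")
```

The residuals start positive, dip negative in the middle, and turn positive again at the far end. That U-shaped pattern, instead of a random scatter around zero, is the telltale sign that a straight line is the wrong shape for the data.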
Calculating the Sample Size You Need
To estimate how many data points you need for a regression analysis, first ask yourself
why you’re doing the regression.
Do you want to show that the two variables are statistically significantly associated? If so,
you want to calculate the sample size required to achieve a certain statistical power for the